Coursera Capstone Project: "The Battle of Neighborhoods"

This is the final project of the IBM Data Science course on Coursera. In this project, each student implements an idea of their own using geolocation information.

Introduction

This notebook searches for the best neighborhood of Toronto to live in as a bicyclist. To do so, it builds on the data from last week to explore locations in Toronto. The bicycle-related data is collected from OpenStreetMap and Foursquare.

Data

The first step is to determine which information can be used:

  • length of bicycle lanes
  • number of public parks
  • number of bicycle workshops and shops for spare parts
  • number of destinations for cyclists

Methodology

With this data I will group the neighborhoods into three clusters. Each cluster stands for a rating:

  1. Very good location for bicycle fans
  2. Good location for bicycle fans
  3. Not a location for bicycle fans

Getting data

The data comes from OpenStreetMap (OSM) and Foursquare. OSM is an open source project that offers several APIs to retrieve information. Foursquare is a commercial enterprise that delivers location-based data; this notebook uses its free, limited developer access.

1. Get borough locations

In the first step I get the outer boundaries of the boroughs of Toronto. I do this with OpenStreetMap because it is free and has a well-documented API. First I install OSMPythonTools, a package for downloading data from OSM. It contains:

  • Overpass, a tool to run queries against the Overpass API
  • Nominatim, a tool to geocode with OSM data
  • other tools that are not used in this notebook
In [256]:
!pip3 install OSMPythonTools
Requirement already satisfied: OSMPythonTools in d:\programme\anaconda3\lib\site-packages (0.2.9)

Here are the libraries which will be used in the notebook:

In [257]:
import pandas as pd
import numpy as np
import json
import copy
from OSMPythonTools.overpass import Overpass
ovp = Overpass()
from OSMPythonTools.api import Api
api = Api()
import folium # map rendering library
import requests # library to handle requests
# import k-means from clustering stage
from sklearn.cluster import KMeans

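import os
# Read the Foursquare credentials from the environment instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption; use your own.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret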
VERSION = '20201113'
LIMIT = 500

Sorting the Data

To draw the boundaries of the boroughs correctly, the ways have to be sorted after the download.

In [258]:
def SortNode(ways):
    """Sort the ways of a boundary in place so that each way starts at the node where the previous one ends."""
    index = 0
    sortidx = 0
    while sortidx < (len(ways) - 1):
        ways_idx = ways[sortidx + 1].copy()
        ways_idx_rev = ways[sortidx + 1].copy()
        ways_idx_rev.reverse()
        if ways[sortidx][-1] == ways_idx[0]:
            # the next way already continues the current one
            sortidx = sortidx + 1
        elif ways[sortidx][-1] == ways_idx_rev[0]:
            # the next way continues the current one but is stored in reverse order
            ways[sortidx + 1] = ways_idx_rev
            sortidx = sortidx + 1
        else:
            # search the other ways for one that continues the current way
            index = sortidx + 2
            if index >= len(ways):
                index = 0
            while index != 1:
                ways_idx = ways[index].copy()
                ways_idx_rev = ways[index].copy()
                ways_idx_rev.reverse()
                if ways[sortidx][-1] == ways_idx[0]:
                    ways.pop(index)
                    ways.insert(sortidx + 1, ways_idx)
                    # restart the scan (sortidx becomes 1 after the increment below)
                    sortidx = 0
                    index = 1
                    break
                elif ways[sortidx][-1] == ways_idx_rev[0]:
                    ways.pop(index)
                    ways.insert(sortidx + 1, ways_idx_rev)
                    # restart the scan (sortidx becomes 1 after the increment below)
                    sortidx = 0
                    index = 1
                    break
                else:
                    index = index + 1
                    if index >= len(ways):
                        index = 0
            sortidx = sortidx + 1

    return ways

def OverpassQuery(query):
    """Run an Overpass query and return the list of elements from the JSON response."""
    result_w = ovp.query(query, timeout=100)
    res_json = result_w.toJSON()
    res_elements = res_json['elements']
    return res_elements
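
The idea behind the sorting step can be illustrated on toy data. The following is a minimal sketch, not the function used in this notebook: `chain_ways` is a hypothetical helper and the node ids are made up. It greedily appends the way that continues at the last node and reverses it when it is stored backwards.

```python
def chain_ways(ways):
    """Greedily order way segments so each one starts where the previous one ends.

    Hypothetical helper for illustration; `ways` is a list of node-id lists.
    """
    remaining = [list(w) for w in ways]
    ordered = [remaining.pop(0)]
    while remaining:
        tail = ordered[-1][-1]
        for i, way in enumerate(remaining):
            if way[0] == tail:
                # the way continues the chain in its stored direction
                ordered.append(remaining.pop(i))
                break
            if way[-1] == tail:
                # the way continues the chain but is stored backwards
                ordered.append(remaining.pop(i)[::-1])
                break
        else:
            raise ValueError("no way continues the chain")
    return ordered


# made-up node ids: four segments of one closed ring, shuffled and partly reversed
toy_ways = [[1, 2, 3], [6, 5], [3, 4, 5], [6, 7, 1]]
print(chain_ways(toy_ways))
```

Unlike `SortNode`, this sketch returns a new list and raises an error when the ring cannot be continued.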

For the Overpass API I have to create one query per borough. Each query is a string built from the OSM area name and the keywords or tags that filter the returned objects. OSM returns nodes, ways or relations. To describe boundaries, OSM uses ways, which consist of many nodes. I download the ways of every borough boundary, sort them, look up the coordinates of their nodes and show the result in a Folium map.

In [259]:
borough = ["Scarborough", "North York", "Old Toronto", "Etobicoke", "York", "East York"]

# one Overpass query per borough: fetch the ways of its administrative boundary relation
queries = [
    f'area[name="Toronto"];relation["boundary"="administrative"]["name"="{b}"](area);way(r);out qt;'
    for b in borough
]

overpass_response = []
for x in queries:
    tmp = OverpassQuery(x).copy()
    overpass_response.append(tmp)

ways = []
for y in overpass_response:
    ways_inner = []
    for x in y:
        ways_inner.append(x['nodes'])

    ways.append(ways_inner)



for x in ways:
    x = SortNode(x)

ways_coordinates = []
node_list = []
bound_lat_max = -360
bound_lat_min = 360
bound_lon_max = -360
bound_lon_min = 360
for bor in ways:
    ways_coordinates_bor = []
    node_list_inner = []
    for w in bor:        
        ways_coordinates_inner = []
        for x in w:
            result_ways = api.query(f'node/{x}')
            ways_coordinates_inner.append([result_ways.lat(),result_ways.lon()])
            node_list_inner.append([result_ways.lat(),result_ways.lon()])
            if result_ways.lat() > bound_lat_max:
                bound_lat_max = result_ways.lat()
            if result_ways.lat() < bound_lat_min:
                bound_lat_min = result_ways.lat()
            if result_ways.lon() > bound_lon_max:
                bound_lon_max = result_ways.lon()
            if result_ways.lon() < bound_lon_min:
                bound_lon_min = result_ways.lon()
        ways_coordinates_bor.append(ways_coordinates_inner)
    node_list.append(node_list_inner)

    ways_coordinates.append(ways_coordinates_bor)

print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')

node_list_reverse = []

for x in node_list:
    tmp_inner = []
    for i in x:
        tmp_inner.append([i[1], i[0]])
    node_list_reverse.append(tmp_inner)
bound_lat_min: 43.5802533, bound_lon_min: -79.6392727, bound_lat_max: 43.8554425, bound_lon_max: -79.1132193
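
The two bookkeeping steps in the cell above (tracking the bounding box and swapping the coordinate order) can also be written compactly. This is a small sketch on made-up coordinates; `bounding_box` and `to_lon_lat` are hypothetical helper names. The swap is needed because Folium expects `[lat, lon]` pairs while GeoJSON stores positions as `[lon, lat]`.

```python
def bounding_box(points):
    """Return [[lat_min, lon_min], [lat_max, lon_max]] for a list of [lat, lon] pairs."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


def to_lon_lat(points):
    """Swap [lat, lon] pairs to the [lon, lat] order used by GeoJSON."""
    return [[lon, lat] for lat, lon in points]


# made-up coordinates for illustration only
toy_ring = [[43.70, -79.40], [43.75, -79.35], [43.72, -79.30]]
print(bounding_box(toy_ring))
print(to_lon_lat(toy_ring))
```

The result of `bounding_box` has the same shape as the argument passed to `fit_bounds` in this notebook.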

Visualize the boroughs in Folium

To make the procedure easier to follow, it helps to view the boroughs on a map. The map shows the boundaries and the colored areas of the boroughs. The boundaries are used later to select the points that are compared between the boroughs with respect to bicycle-related topics.

In [260]:
latitude = 43.7134408
longitude = -79.541716

map_select = folium.Map(location=[latitude, longitude], zoom_start=10)
color_map = ['red', 'green', 'blue', 'yellow', 'violet', 'black']

for x, col in zip(node_list, color_map):
    folium.PolyLine(x, fill = True, fill_color=col, fill_opacity =0.6).add_to(map_select)

print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')
map_select.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
map_select
bound_lat_min: 43.5802533, bound_lon_min: -79.6392727, bound_lat_max: 43.8554425, bound_lon_max: -79.1132193
Out[260]:
[Folium map: boundaries and filled areas of the six boroughs of Toronto]

Boroughs of Toronto:

Borough      Color
Scarborough  Red
North York   Green
Old Toronto  Blue
Etobicoke    Yellow
York         Violet
East York    Black

2. Get bicycle lanes and tracks

To measure the length of the routes available for cycling, we extract the bicycle lanes and tracks from OSM and calculate their total length in each borough. First, let's look at the bicycle tracks of every borough.

In [261]:
node_list_string = []
for n in node_list:
    node_tmp = f''
    for x in n:
        node_tmp = node_tmp + ' ' + str(x[0]) + ' ' + str(x[1])
        
    node_list_string.append(node_tmp)
#print(node_list_string[2])
borough_ways_geo = []
for b in node_list_string:
    query = f'(way["cycleway"](poly:"{b}");way["bicycle"="yes"](poly:"{b}");\
        way["segregated"](poly:"{b}");way["highway"="cycleway"](poly:"{b}"););\
        out geom qt;'
    
    tmp = OverpassQuery(query)
    
    ways_geo = []
    for w in tmp:
        geo_loc = []
        for n in w["geometry"]:
            #print(f'n: {n["lat"]}')
            geo_loc.append([n["lat"], n["lon"]]) 

        ways_geo.append(geo_loc)
    borough_ways_geo.append(ways_geo)
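
The `poly` filter in the queries above expects one string of space-separated latitude and longitude values that describe the polygon. A minimal sketch of that conversion on made-up coordinates follows; `to_poly_string` is a hypothetical helper name, and the single-tag query is only an example of where the string is used.

```python
def to_poly_string(points):
    """Join [lat, lon] pairs into the 'lat lon lat lon ...' string used by the Overpass poly filter."""
    return " ".join(f"{lat} {lon}" for lat, lon in points)


# made-up triangle for illustration only
toy_polygon = [[43.7, -79.4], [43.75, -79.35], [43.72, -79.3]]
poly = to_poly_string(toy_polygon)

# example query that restricts cycleways to the polygon
query = f'way["highway"="cycleway"](poly:"{poly}");out geom qt;'
print(query)
```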
In [262]:
latitude = 43.7134408
longitude = -79.541716

map_select = folium.Map(location=[latitude, longitude], zoom_start=10, tiles='Openstreetmap',)
color_map = ['red', 'green', 'blue', 'yellow', 'violet', 'black']

for b,c,n in zip(borough_ways_geo, color_map, node_list):
    for x in b:
        folium.PolyLine(x, fill = False, color=c, fill_opacity =0.6).add_to(map_select)
    folium.PolyLine(n, color=c, fill = True, fill_color=c, fill_opacity =0.2).add_to(map_select)
print(f'bound_lat_min: {bound_lat_min}, bound_lon_min: {bound_lon_min}, bound_lat_max: {bound_lat_max}, bound_lon_max: {bound_lon_max}')
map_select.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
map_select
bound_lat_min: 43.5802533, bound_lon_min: -79.6392727, bound_lat_max: 43.8554425, bound_lon_max: -79.1132193
Out[262]:
[Folium map: bicycle ways and borough areas, one color per borough]

Here you can see the ways that are marked for bicycle usage in OSM. Every borough is shown in a different color. Next, let me show the number and the total length of these ways for each borough.

In [263]:
borough_ways_length = pd.DataFrame([], index=borough, columns=["Number", "Length in km"])
for b, i in zip(node_list_string, borough):
    query_length = f'(way["cycleway"](poly:"{b}");way["bicycle"="yes"](poly:"{b}");\
                     way["segregated"](poly:"{b}");way["highway"="cycleway"](poly:"{b}"););\
                     make statistics number=count(ways), length=sum(length());out;'
    
    tmp = OverpassQuery(query_length)
    borough_ways_length.loc[i] = [tmp[0]["tags"]["number"], float(tmp[0]["tags"]["length"]) / 1000]
borough_ways_length
Out[263]:
             Number  Length in km
Scarborough     423       103.757
North York      572       108.545
Old Toronto    1383       221.019
Etobicoke       533       107.204
York            112       24.0808
East York       193       40.0049
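
The lengths above are computed on the server by the `length()` evaluator of Overpass. As a rough local cross-check, the length of a single way can be approximated from its geometry with the haversine formula. This is a sketch under assumptions: the helper names are hypothetical and the Earth is treated as a sphere with a mean radius of 6371 km, so small deviations from the Overpass values are to be expected.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between two [lat, lon] points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def way_length_km(geometry):
    """Sum the distances between consecutive points of a way."""
    return sum(haversine_km(a, b) for a, b in zip(geometry, geometry[1:]))


# made-up way with three points for illustration only
toy_way = [[43.700, -79.400], [43.705, -79.400], [43.705, -79.390]]
print(round(way_length_km(toy_way), 3))
```

Applied to the entries of `borough_ways_geo`, `way_length_km` would give one length per way, which could then be summed per borough.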

The next step is to show the length in kilometers as a choropleth map with Folium. The colors from white to purple represent the absolute length in kilometers: the borough with the fewest kilometers is shown near white and the borough with the most kilometers in purple.

In [264]:
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},\
    "geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
geojson = geojson[:-1] + f']}}'
m = folium.Map(location=[latitude, longitude], zoom_start=10)

folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_ways_length["Length in km"],
    key_on='feature.id',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Kilometers of bicycle tracks',
    bins = 9
).add_to(m)

folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
m
Out[264]:
[Folium choropleth map: kilometers of bicycle tracks per borough]
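
The GeoJSON string for the choropleth is assembled by hand in the cell above. A sketch of the same structure built from dictionaries and serialized with `json.dumps` is shown here; `build_feature_collection` is a hypothetical helper, and closing each ring (repeating the first position at the end) is an addition that the GeoJSON specification asks for, not something the original cell does.

```python
import json


def build_feature_collection(names, rings_lon_lat):
    """Build a GeoJSON FeatureCollection string with one polygon per name.

    `rings_lon_lat` holds one list of [lon, lat] positions per name.
    """
    features = []
    for name, ring in zip(names, rings_lon_lat):
        ring = list(ring)
        if ring and ring[0] != ring[-1]:
            ring.append(ring[0])  # close the ring
        features.append({
            "type": "Feature",
            "id": name,
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})


# made-up triangle for illustration only
toy_geojson = build_feature_collection(["Toy borough"], [[[-79.4, 43.7], [-79.3, 43.7], [-79.3, 43.8]]])
print(toy_geojson)
```

With the notebook's data the call would be `build_feature_collection(borough, node_list_reverse)`, and the result could be passed as `geo_data` in the same way as the hand-built string.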

As an alternative, I visualize the length in kilometers in relation to the area of every borough. So let me divide the length by the area of each borough.

In [265]:
borough_data = borough_ways_length.transpose()

borough_data.loc["Area"] = [187.7, 176.9, 97.2, 123.9, 23.2, 21.3]
borough_data.loc["Length per square km"] = (borough_data.loc["Length in km"].astype(float) / borough_data.loc["Area"].astype(float)).round(2)
borough_data = borough_data.transpose()
borough_data
Out[265]:
             Number  Length in km   Area  Length per square km
Scarborough     423       103.757  187.7                  0.55
North York      572       108.545  176.9                  0.61
Old Toronto    1383       221.019   97.2                  2.27
Etobicoke       533       107.204  123.9                  0.87
York            112       24.0808   23.2                  1.04
East York       193       40.0049   21.3                  1.88
In [266]:
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},\
    "geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
geojson = geojson[:-1] + f']}}'
#print(geojson)
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Length per square km"],
    key_on='feature.id',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Length per square km',
    bins = 9
).add_to(m)

folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
m
Out[266]:
[Folium choropleth map: length per square km per borough]

In this kind of presentation the larger boroughs are less attractive for bicyclists.

3. Get the number of parks

To get the number of parks for each borough I used Foursquare. I created a query for each borough, stored the number of results in a list and added them to the comparison dataframe. The second step was to set the number in relation to the area and add it as a column to the dataframe.

In [267]:
foursquare_query = 'parks'
borough = ["Scarborough", "North York", "Old Toronto", "Etobicoke", "York", "East York"]

foursquare_parks = []
for b in borough:
    near = f'{b}, ON, Canada'
    url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&near={near}&v={VERSION}&query={foursquare_query}&limit={LIMIT}'
    results = requests.get(url).json()
    foursquare_parks_geo = []
    for x in results["response"]["groups"][0]["items"]:
        #foursquare_parks_geo.append([x["venue"]["location"]["lat"], x["venue"]["location"]["lng"]])
        foursquare_parks_geo.append([x["venue"]["name"]])
    foursquare_parks.append(len(foursquare_parks_geo))
borough_data["Parks"] = foursquare_parks
borough_data["Parks per square km"] = (foursquare_parks / borough_data["Area"].astype(float)).round(2)
borough_data
Out[267]:
             Number  Length in km   Area  Length per square km  Parks  Parks per square km
Scarborough     423       103.757  187.7                  0.55     99                 0.53
North York      572       108.545  176.9                  0.61    100                 0.57
Old Toronto    1383       221.019   97.2                  2.27     17                 0.17
Etobicoke       533       107.204  123.9                  0.87     92                 0.74
York            112       24.0808   23.2                  1.04    100                 4.31
East York       193       40.0049   21.3                  1.88     35                 1.64
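
The request URL in the cell above is built by string concatenation, so spaces and commas in values such as "North York, ON, Canada" are left for the HTTP library to escape. A sketch that encodes the parameters explicitly with the standard library is shown here; `build_explore_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(near, query, client_id, client_secret, version, limit):
    """Return the explore URL with all query parameters percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "near": near,
        "v": version,
        "query": query,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# placeholder credentials for illustration only
url = build_explore_url("North York, ON, Canada", "parks", "YOUR_ID", "YOUR_SECRET", "20201113", 500)
print(url)
```

The resulting string can be passed to `requests.get` exactly like the concatenated URL.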

The next two maps present the number of parks and the number of parks in relation to the area. From yellow to green, the attractiveness of the borough increases.

In [268]:
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},\
    "geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
geojson = geojson[:-1] + f']}}'
#print(geojson)
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Parks"],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Parks per borough',
    bins = 9
).add_to(m)

folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
m
Out[268]:
[Folium choropleth map: parks per borough]
In [269]:
geojson = f'{{"type":"FeatureCollection","features":['
for x, b in zip(node_list_reverse, borough):
    geojson = geojson + f'{{"type":"Feature","id":"{b}","properties":{{"name":"{b}"}},\
    "geometry":{{"type":"Polygon","coordinates":[{x}]}}}},'
geojson = geojson[:-1] + f']}}'
#print(geojson)
m = folium.Map(location=[latitude, longitude], zoom_start=10)
folium.Choropleth(
    geo_data=geojson,
    name='choropleth',
    data=borough_data["Parks per square km"],
    key_on='feature.id',
    fill_color='YlGn',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Parks per square km',
    bins = 9
).add_to(m)

folium.LayerControl().add_to(m)
m.fit_bounds([[bound_lat_min, bound_lon_min], [bound_lat_max, bound_lon_max]]) 
m
Out[269]:
[Folium choropleth map: parks per square km per borough]

The next step is to get bicycle-related shops from OSM. We add the result to the comparison dataframe.

In [270]:
borough_bicycle_shop_counts = []
for b, i in zip(node_list_string, borough):
    query_bicycle_shop_counts = f'(node["shop"="bicycle"](poly:"{b}"););\
        make statistics nwr_count=count(nwr);out;'
    
    tmp = OverpassQuery(query_bicycle_shop_counts)
    #print(tmp)
    borough_bicycle_shop_counts.append(int(tmp[0]["tags"]["nwr_count"]))
borough_data["Shops"] = borough_bicycle_shop_counts
borough_data
Out[270]:
             Number  Length in km   Area  Length per square km  Parks  Parks per square km  Shops
Scarborough     423       103.757  187.7                  0.55     99                 0.53      2
North York      572       108.545  176.9                  0.61    100                 0.57      2
Old Toronto    1383       221.019   97.2                  2.27     17                 0.17     48
Etobicoke       533       107.204  123.9                  0.87     92                 0.74      4
York            112       24.0808   23.2                  1.04    100                 4.31      2
East York       193       40.0049   21.3                  1.88     35                 1.64      3

From OSM I also get bicycle-related destinations such as picnic places, viewpoints and huts to explore.

In [271]:
borough_bicycle_attractions = []
for b, i in zip(node_list_string, borough):
    query_bicycle_attractions = f'(node["tourism"="viewpoint"](poly:"{b}");\
    node["tourism"="picnic_point"](poly:"{b}");node["tourism"="wilderness_hut"](poly:"{b}");\
    node["tourism"="alpine_hut"](poly:"{b}"););\
        make statistics nwr_count=count(nwr);out;'
    
    tmp = OverpassQuery(query_bicycle_attractions)
    #print(tmp)
    borough_bicycle_attractions.append(int(tmp[0]["tags"]["nwr_count"]))
borough_data["Destinations"] = borough_bicycle_attractions
borough_data
Out[271]:
             Number  Length in km   Area  Length per square km  Parks  Parks per square km  Shops  Destinations
Scarborough     423       103.757  187.7                  0.55     99                 0.53      2            13
North York      572       108.545  176.9                  0.61    100                 0.57      2             2
Old Toronto    1383       221.019   97.2                  2.27     17                 0.17     48            28
Etobicoke       533       107.204  123.9                  0.87     92                 0.74      4             2
York            112       24.0808   23.2                  1.04    100                 4.31      2             0
East York       193       40.0049   21.3                  1.88     35                 1.64      3             2

Results

In this section I create three clusters from the collected data. The three clusters determine a rating of the locations for bicycle fans.

Cluster 1 is a very good location for bicycle fans

Cluster 2 is a good location for bicycle fans

Cluster 0 is not recommended for bicycle fans

In [272]:
# set number of clusters
kclusters = 3

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(borough_data)

#borough_data.insert(0, 'Cluster Labels', kmeans.labels_)
borough_data["Cluster Labels"] = kmeans.labels_
borough_data
Out[272]:
             Number  Length in km   Area  Length per square km  Parks  Parks per square km  Shops  Destinations  Cluster Labels
Scarborough     423       103.757  187.7                  0.55     99                 0.53      2            13               2
North York      572       108.545  176.9                  0.61    100                 0.57      2             2               2
Old Toronto    1383       221.019   97.2                  2.27     17                 0.17     48            28               1
Etobicoke       533       107.204  123.9                  0.87     92                 0.74      4             2               2
York            112       24.0808   23.2                  1.04    100                 4.31      2             0               0
East York       193       40.0049   21.3                  1.88     35                 1.64      3             2               0
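
To read the table as a rating, the cluster labels can be translated into the three descriptions given at the start of this section. The sketch below copies the labels from the table above by hand; the dictionary names are hypothetical. Note that k-means label numbers are arbitrary identifiers, so the mapping has to be checked again whenever the clustering is rerun with different data or settings.

```python
# cluster labels copied from the table above
cluster_labels = {
    "Scarborough": 2,
    "North York": 2,
    "Old Toronto": 1,
    "Etobicoke": 2,
    "York": 0,
    "East York": 0,
}

# rating per cluster as described in this section
rating_by_cluster = {
    1: "Very good location for bicycle fans",
    2: "Good location for bicycle fans",
    0: "Not recommended for bicycle fans",
}

ratings = {name: rating_by_cluster[label] for name, label in cluster_labels.items()}
for name, rating in ratings.items():
    print(f"{name}: {rating}")
```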

Discussion

The results are strongly tied to the boroughs of Toronto. This leaves a wide area in which to select a new home. The boroughs also differ greatly in size, so if you select e.g. Scarborough you have more area to explore than in Old Toronto. To improve the results, it would be a good solution to divide Toronto into equal rectangles or squares and repeat the same analysis with them.
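
The grid idea could start from the bounding box printed earlier in this notebook. The following is only a sketch: `make_grid` is a hypothetical helper, the choice of 4 rows and 6 columns is arbitrary, and the cells are equal in degrees, which makes them only approximately equal in area.

```python
def make_grid(lat_min, lon_min, lat_max, lon_max, rows, cols):
    """Split a bounding box into rows x cols cells of (south, west, north, east)."""
    lat_step = (lat_max - lat_min) / rows
    lon_step = (lon_max - lon_min) / cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            south = lat_min + r * lat_step
            west = lon_min + c * lon_step
            cells.append((south, west, south + lat_step, west + lon_step))
    return cells


# bounding box taken from the output printed earlier in this notebook
grid = make_grid(43.5802533, -79.6392727, 43.8554425, -79.1132193, rows=4, cols=6)
print(len(grid))
print(grid[0])
```

Each cell has the (south, west, north, east) order that Overpass uses for bounding-box filters, so the same counting queries could be repeated per cell instead of per borough.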

Conclusion

The purpose of this notebook was to help people who want to select a new place to live based on bicycle-related attributes. It delivers a table with three clusters that can be used to select locations to explore as new places to live.
